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If gamma-ray bursts are at cosmological distances, they must be gravitationally lensed occasion- 
ally [2]. The detection of lensed images with millisecond-to-second time delays provides evidence 
for intermediate-mass black holes, a population which has been difficult to observe. Several studies 
have searched for these delays in gamma-ray burst light curves, which would indicate an intervening 
gravitational lens [3} [4 [5) [6]. Among the ~ 10* gamma-ray bursts observed, there have been a hand- 
ful of claimed lensing detections [7], but none have been statistically robust. We present a Bayesian 
analysis identifying gravitational lensing in the light curve of GRB 950830. The inferred lens mass 
depends on the unknown lens redshift z,, and is given by (1+ z,)M, = 5.5453 x 10* Mo (90% credibility), 
which we interpret as evidence for an intermediate-mass black hole. The most probable configuration, 
with a lens redshift z ~ 1 and a gamma-ray burst redshift z; ~ 2, yields a present day number density 
of = 225, x 10° Mpc? (90% credibility) with a dimensionless energy density QyypBH ~ 4.61938 x 10-4. 
The false alarm probability for this detection is ~ 0.6% with trial factors. While it is possible that 
GRB 950830 was lensed by a globular cluster, it is unlikely since we infer a cosmic density inconsistent 
with predictions for globular clusters Ngo 7% 8 x 107° at 99.8% credibility. If a significant intermediate- 
mass black hole population exists, it could provide the seeds for the growth of supermassive black holes 
in the early Universe. 

The evidence for a cosmological population of intermediate-mass black holes (IMBHs) is mounting. They have long 
been posited to reside in the cores of globular clusters. Dynamical friction in stellar clusters causes the most massive 
stars to sink to the bottom of the cluster’s gravitational potential. Since 2004, simulations have indicated that, for 
small compact clusters, stellar mergers happens within the lifetime of giant stars [8]. Critically, these mergers occur 
before the stars go supernova and disturb the system, leading to a runaway collision and the formation of a ~ 10° Mo 
megastar. These short-lived monsters could seed IMBHs, which subsequently grow through accretion and mergers. 
Yet direct observational signatures of their existence are elusive. 

Their large mass puts the majority of IMBH mergers outside the sensitivity range of the current generation of 
gravitational-wave detectors. The Advanced Laser Interferometer Gravitational-wave Observatory (LIGO) [9] and 
Virgo are sensitive to mergers with a total merger-product mass of S 400 Mo. Furthermore, IMBHs are too small 
to be observed using the same techniques employed to detect supermassive black holes in galactic nuclei. They are 
either not massive enough, or live in a state of starvation, unable to accrete enough gas to power quasar-like emission. 

Astronomers are converging on the population of IMBHs from both ends of the black hole mass spectrum. Com- 
pact object mergers detected by LIGO-Virgo are uncovering a population of black holes edging closer to the range 
traditionally reserved for IMBHs [11], including the recent discovery of a 150 Mo merger product [I2]. From the other 
end, the lower limit on supermassive black holes in the nuclei of dwarf galaxies is descending [13]. There have been 
recent findings of compact objects with mass 104 — 10° Me residing in galactic cores [14] [I5]. Observations of a tidal 
disruption event from the evisceration of a star by a black hole’s tidal field suggest an IMBH resides in a star cluster 
on the outskirts of a barred lenticular galaxy [16]. In this Letter, we provide evidence for an IMBH using gravitational 
lensing. Gravitational lensing is one of the few ways to directly constrain IMBH population statistics by providing an 
estimate for the number density of IMBHs. 

In strong gravitational lensing, photon paths from a background source are distorted due to curved spacetime, 
producing multiple images. The relative fluxes and the difference between arrival times for each image can be used to 
infer the gravitational structure of the lens. In the case of a compact lens, the mass can be directly determined up 
to a redshift factor. The fraction of distant sources which experience multiple-imaging is directly proportional to the 
dimensionless energy density of compact lenses, Qiens = Plens/Pc where flens is the energy density of lenses while pe 
is the critical energy density required for a flat Universe. This fraction is independent of the lens mass, mjens. Strong 
lensing is accompanied by an overall magnification, typically a factor of a few in flux. This allows us to probe more 
distant sources, or sources which would otherwise be too faint to detect. 

Gamma-ray bursts (GRBs) are extremely luminous bursts of 7-rays, with peak energies of 100 — 300 keV. They are 
thought to be generated by the rapid infall of material onto a nascent stellar mass black hole, formed through either 
a collapsar supernova or a compact object merger. Some of the accreting material is launched in ultra-relativistic, 
bipolar jets along the rotation axis. A fraction of this outflow is converted into electromagnetic radiation, which is 
Lorentz-boosted into y-rays. The cosmological nature of GRBs is well established by both the isotropy of observed 
events [18] and redshift measurements of their optical afterglow [19]. A cosmological origin implies that at least some 
fraction of the GRB population must be strongly lensed [I]. 

Gamma-ray detectors, unlike those used for optical or infrared astronomy, have comparatively poor angular reso- 
lution, but good temporal resolution. Thus, we do not expect to resolve a gravitationally lensed image pair in y-rays. 
However, a time delay between the two images, resulting from the differences in geometric path and relative differ- 
ences in gravitational field strength, can be observed. The photons which travel a longer distance arrive first, as the 
shorter path traverses deeper into the gravitational potential well of the lens where time dilation is stronger. The 
gravitationally retarded image is dimmer than the first image. The observational signature of such an event is thus 
an initial y-ray pulse followed by a duplicate “echo.” The duration of the time delay between the burst and the echo 
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is predominantly determined by the mass of the gravitational lens, but also by the alignment of the y-ray source with 
respect to the observer-lens line-of-sight. For a point-mass lens [20] [21] [22], 


At (r—-1 =i 
(1 +2z2)M = E (=F +r) : (1) 
Here, At is the time delay, r is the ratio of the fluxes, and (1+ ~)M is the redshifted lens mass. By measuring At 
and r we can infer the redshifted mass (1 + z)M. 

The total number of observed GRBs is of order 104. We analyse the BATSE dataset as it is the largest available 
single dataset at ~ 2,700 bursts. We include both long and short GRBs in our study. For a burst and echo to 
occur within the same BATSE light curve, we require a time delay of < 240s. The minimum detectable time delay is 
determined by the width of the y-ray pulse; if the delay time is too short, the two images merge into one. For long 
GRBs, the minimum detectable time delay is ~ 1s, and for short bursts, it is ~ 40ms. This range of time delays 
corresponds to a lens mass range of approximately 10? — 107 Mo [23) [21]. 

We identify preliminary lensing candidates with an auto-correlation analysis [24 [7]. We utilise the four available 
broadband energy channels of BATSE burst data independently. The equivalence principle dictates that all wavelengths 
of light are equally affected by gravitational fields. This implies two constraints: the time delay is independent of the 
photon energy, and the gravitational magnification of each image is identical for every wavelength. Once we have 
identified candidates, we employ Bayesian model selection to determine the Bayesian odds comparing the lensing 
hypothesis to the no-lensing hypothesis. Our unified framework simultaneously provides the detection significance 
while estimating the lensing parameters, which we use to infer the lens mass. To model GRB pulses, we employ the 
fast-rise exponential-decay (FRED) model [25]. Details are provided in the supplementary material. 

We uncover one statistically significant gravitational lensing candidate: GRB 950830 (BATSE trigger 3770)—a 
short y-ray burst. The light curve for this burst is shown in Fig. [1] with the reconstructed curve of the best-fit model 
plotted in black. The black curve is created by taking the mean of the curves drawn by each of the = 60,000 posterior 
sample sets at each time bin. We find that each individual pulse is best fit by a variation of the FRED pulse model 
plus a sine-Gaussian function. We analyse the four available energy channels independently and find that the lensing 
hypothesis is preferred in each channel with In Bayes Factors (In(BF)) between 0.5— 7. Adding the In(BF)s from each 
of the channels, we find the total In(BF) = 12.9 (logio BF = 5.6) in favour of lensing, indicating strong statistical 
support for the lensing hypothesis. A In(BF) of eight is considered “strong evidence” in support of one model over the 
other [26]. Detailed fits are shown in Extended Data Figs.1-6, including an example of a ‘double’ burst which is not a 
lens (Extended Data Figs. 7-8). 

Assuming a point-mass deflector, the marginalised posterior distributions for time delay and magnification ratio of 
this lensing event in Fig.2 can be used in conjunction with equation to infer a redshifted lens mass of (Fig.3) 


(1+ 2)M ~ 5.5th3 x 10* Mo. (2) 


There are three astrophysical objects in this mass range, which might serve as a lens: globular clusters, dark matter 
halos, and black holes. A gravitational lens is well approximated as a point mass if most of its mass is contained within 
the region bound by the two lensed images where they bisect the cosmological plane of the lens. Taking instead an 
isothermal mass distribution as the gravitational lens, and integrating over all 2%, Zs, we find a lens velocity dispersion 
of ~ 4kms~!. From simulations, this dispersion is associated with an Navarro—Frenk—White (NFW) profile of mass 
~ 10° Me B7]. Globular clusters follow a similar mass—velocity dispersion scaling [28]. In either framework then, 
either a singular point mass, or a self-gravitating isothermal sphere, we have a consistent measurement for the mass. 

Dark matter halos are numerous, and their number density can be calculated using the Press-Schechter formalism. 
However, each has a negligible contribution to lensing cross-section, as NFW mass distributions typically have cores, 
which are not sufficiently massive to produce multiple images. Globular clusters are compact enough to produce 
multiple images, but there are not many of them. Assuming that the Milky Way’s ~ 200 globular clusters are typical, 
and that the Milky Way formed from an overdensity of approximately 20 Mpc?, then the number density of globular 
clusters is approximately 10 Mpc”, giving Qgc(10° Mo) ~ 8x 10-°—-significantly lower than the mean density implied 
by GRB 950830. 

Following [I7], we use the optical depth to estimate the cosmological density, Q ~ 7 ((z,)). Assuming that BATSE 
y-ray bursts have a mean redshift of two, the IMBH energy density is 


Onin (M ~ 104° Mo, (zs) ~ 2) = 4.6423 x 1074. (3) 
The present day number density of IMBHs is 
Nimen (M ~ 10475 Mo) ~ 2.3173 x 10° Mpc“, (4) 


(90% credibility), where we have assumed a lens redshift of z ~ 1. The uncertainty is from Poisson counting statistics. 
Thus, there should be approximately = 4.6t3:8 x 104 in the neighbourhood of the Milky Way. There are approximately 
108 stellar-mass black holes in the Milky Way [29]. Assuming all stellar mass black holes are bound to galaxies, and 
with nga ~ 0.04 Mpc™?, then the number density of stellar mass black holes is stellar ~ 107 Mpc~*. Our result for the 
IMBH density is consistent with the stellar mass black hole density assuming that number density scales as ~ M~!. 
Note that the mean redshift of Swift short GRBs is (zs) ~ 0.8. If the GRBs in the BATSE sample had the same mean, 
then the inferred cosmological density Qımsu would increase by about an order of magnitude. Extended Data Figs. 
9-10 give results at different source and lens redshifts. 

Our estimate for Qrisn is consistent with the null result of other GRB lens searches [6] [30], which are sensitive to 
different lens masses. The Fermi and Konus-Wind catalogues are similar in size to the BATSE GRB catalogue, and 
there is ~ 50% probability that these contain another GRB which is gravitationally lensed by an IMBH. In addition, 
due to the relatively flat GRB luminosity function [23], the uncertainty in nmgy derived from a single lensing event is 
more significant than the potential magnification bias. 

If this detection represents the first determination of the space density of IMBHs, then it may shed light on open 
questions in astrophysics. How are the supermassive black holes that power quasars so massive at high redshift? Are 
IMBHs gravitationally bound to galaxies? Do they have observational signatures in electromagnetic or gravitational 
radiation? What is their relationship to the globular cluster population? Are they the remnants of direct collapse of 
104-6 Mo dark matter or baryonic clouds in the early Universe? The identification of additional lensing candidates 
in the GRB catalogues will confirm this result, and allow a more precise determination of Qimgu- 
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Methods 


In this appendix we provide additional background on our methods. We start with a general overview of GRB lensing 
to place our research within the context of the wider field. From there, we describe our selection method for finding 
gravitationally lensed GRB candidates. We then discuss the statistics of photon counting in y-ray astronomy. We 
construct a Bayesian framework with a model for the lensing signal and y-ray background. We go on to discuss the 
validity and robustness of our results. We include calculations to determine the optical depth to lensing for a source 
population at mean redshift (z,), and provide evidence against the alternative hypothesis, that GRB 950830 was lensed 
by a globular cluster. We derive the uncertainty on our estimate for the number density of IMBH nmsu. We also 
include an estimate of the false alarm probability, both with and without trial factors. Finally, we include a candidate 
identified by the autocorrelation detection algorithm but strongly rejected by our Bayesian analysis for illustrative 
purposes. 


A brief literature review. Gravitational lensing studies of y-ray bursts typically come in one of two flavours. 
There are autocorrelation studies, which search for echoes of the y-ray burst within the same light curve. Then there 
are cross-correlation studies, which compare the light curve similarity of two separate GRB triggers on a per-bin basis. 
These are typically accompanied by positional coincidence statistics, which check if the GRBs have consistent source 
locations. Our study is of the first flavour. 

Traditional strong gravitational lensing refers to the multiple imaging of, e.g., quasars due to galactic-mass gravita- 
tional lenses. Also known as macrolensing, fiducial image separations are on the order of arcseconds, with inter-image 
time delays of days to years depending on the lens geometry and source-lens alignment. Millilensing (or mesolensing) 
is loosely defined as gravitational lensing due to million solar mass objects [3I], and produces time delays of order 
seconds. The conversion between mass and time delay is 


At ~ 50 (M;/10°Mo) seconds (5) 


for a Schwarzschild potential [21]. In essence, millilensing fills the mass range between traditional strong lensing and 
microlensing whereby stars acting either alone or in unison in galaxies [32] [33] [84| produce multiple images with ~ 
microsecond time delays. At the more extreme end, nanolensing [35] (~ 1076 — 107} Mo), picolensing (~ 10-12-1077 
Mo), and femtolensing (~ 10716 — 10713 Ma) describe deflections and interference effects due to planetary and 
sub-planetary mass gravitational lenses. 

Autocorrelation probes millilensing echoes from ~ 10? — 10° Mo gravitational lenses. The minimum lens mass 
is determined by the temporal resolution of the instrument in addition to the variability timescale and duration 
of the burst. The upper mass limit is determined by the instrumental cut-off of data recording after the event 
trigger. Numerous autocorrelation searches of the BATSE database have been done using the summed 64 ms light 
curves [37] [38},[7]. Autocorrelation has been used on the Fermi GBM and Swift BAT catalogues [6], with a null result 
for lenses masses of 101 — 10? Mo. 


Cross correlation studies probe time delays equal to the difference in arrival time of the two bursts. The observation 
of the second image is inhibited by Earth occultation. This sets a minimum observable time delay, since y-ray 
observatories typically have ~ 90min orbital periods. Recent work on the Fermi GBM response shows how the 
observation conditions of a y-ray burst significantly affect the inferred spectrum [39]. The effects of detector angular 
response, energy response, atmospheric scattering, accumulated particle precipitation, cosmic background and Galactic 
sources such as the Sun or Crab Pulsar complicate cross-correlation studies. No lensing studies have looked for GRB 
lensing across multiple observatories due to the inherent difficulties comparing light curves from instruments with 
different energy responses. Cross-correlation studies of the BATSE database are also numerous [41], in addition to 
Fermi GBM [42]. A study of Konus-Wind GRBs searching for time delays of ~ hours to ~ 25 years (lens masses of 
10° — 10'3 Mọ) was further augmented by the inclusion of spectral analysis [30]. Another lens identification technique 
involves correlation of the cumulative light curve in three spectral dimensions [43]. Correlation-free approaches include a 
model-agnostic statistical method, which does take into account Poisson statistics [8], and Fourier analysis methods [4]. 
For a comprehensive review of the gravitational lensing of transient events, see [44 [80]. 


Candidate selection. The BATSE catalogue contains 2,704 triggered y-ray bursts. Of these, 2,629 have discsc 
bfits (64ms) light curves available for download. Furthermore, higher-time-resolution observations are available 
for 2,446 of these y-ray bursts, with 2,435 existing as pre-binned tte bfits (5ms) light curves. We carry out a 
preliminary auto-correlation search (described below) on both the discsc bfits and tte bfits pre-binned light 
curves. Candidate detections are followed up with further analysis. In total we carry out auto-correlation on 2,679 
unique y-ray bursts. 

Signal (auto)-correlation can be used to measure the time delay of temporally overlapping signals of a gravitationally 
lensed system [24]. We define the autocorrelation function (ACF) as 


Zj=o L(t; + St)I (ti) 
N 
r-o L (th)? 
where the sum in the numerator is taken over the bins where the two signals overlap, and the sum in the denominator 
is taken over the entire input signal [24 [7]. Here, I(t;) is the count rate at time bin tj, N is the total number of bins, 


and n the total number of overlapping bin, where j, k index these bins in the summations. 
We fit a 3°¢ order Savitzky-Golay filter F (ôt) to the ACF. The dispersion ø between the ACF and the fit F'(6t) is 


C(ét) = ; (6) 


1 N 
pan D [C (8t) — Fs)’, (7) 


j=0 


where N is the total number of bins. We identify 30 outliers as gravitational-lensing candidates [7] [6]. Furthermore, 
we autocorrelate each of the four BATSE LAD energy channels, and perform the same filtering process. Gravitational 
lensing is achromatic for point sources, so we expect that each channel of a candidate lens GRB should autocorrelate 
with the same time delay. We check that candidates yield lensing signals in both the summed light curve and individual 
energy channels. 


Photon counting. Photon counting is a Poisson process. High-energy satellites like BATSE accumulate photons at 
a series of discrete times, t1, t2, ...t,. For BATSE, these photons are collected with a sampling frequency of 500 kHz. In 
most cases, hardware limitations require that the BATSE photon arrival times (time-tagged events; TTE) are down- 
sampled before transmission to Earth. Only the shortest, moderately bright bursts are completely contained within 
the TTE photon-list data. Fortunately, this is the case for GRB 950830. For bursts not completely encoded in a TTE 
list, the counts are averaged into 64ms bins before transmission, typically recorded for ~ 240s after triggering. We 
ignore the fact that BATSE has a small dead time (of about one clock cycle) after each count as this is only becomes 
import for very bright bursts which saturate the detector. 

If we consider a single time stamp t;, the likelihood of observing N; photons is given by Poisson counting statistics 


Aie tes 
L(N;|?) = WNI (8) 
The expected number of photons, Xg is 
ào i = Ot; R(t:0). (9) 


Here ôt; is the sampling time and R(t;|0) is the photon rate (in units of photons per unit time) evaluated at time t; 
and given model parameters 0. The sampling time, ôt;, is subscripted with index 7 to account for the cases where the 
time resolution in the available data changes during an event. The rate can be written as a sum of signal S (from the 
GRB) and background B 


R(t;|0) = B + S(t;|9). (10) 


To first order, the background is constant, but the signal varies with time according to model parameters @. 
The likelihood of observing N = Nj, No,... photons at times t1, t2, .... is given simply by taking the product of the 
likelihood functions evaluated at different times 


LINIO) = J Li /9). (11) 


It is easier, though, to work with the ln likelihood, which is 


In £(N|0) =X In L(N;10) A 
z 5 N; ln (tB + ötiS(t:10)) 


z (01.8 Es ötiS(t:10)) —log(N;!). (13) 


Bayesian Inference. There are two goals of Bayesian inference. The first is to derive posterior distributions p(6|d) 
for our model parameters, which enables us to determine their credible intervals. The second goal is to calculate the 
Bayesian evidence Z for a set of models in order to do model selection. Bayes theorem, 


L(d\0)n(9) 
plola) = SEO, 
relates the posterior probability density p(0|d) of model parameters 0 given the observed data d, to a likelihood function 
L(d|0) and prior probability density 7(0). The likelihood function is a mathematical description of the probability 
of observing the data with the given model parameters. The priors are probability distributions for what we expect 
these parameters to be, which, in our case, are informed by the BATSE GRB population. The evidence, also called 
the marginal likelihood, is a normalisation factor which gives information about the quality of the fit of the model to 
the data averaged over parameter space, viz. 


(14) 


Z= | £(ajo)n(o)ab. (15) 
We define different models for 
S(til0, M), (16) 


which we use to do Bayesian inference. The null model M = My states that there is no lensing. The lens model 
M = Mens states that there is lensing. We adopt the Fast-Rise Exponential-Decay (FRED) pulse model, which is 
ubiquitous in GRB pulse-modelling: 


t-A T 

S(t|A,7,€, A) = Aexp |—€ + : (17) 
T t—A 

Here, A is a vertical y-scale factor, 7 is a duration scaling parameter, and € is an asymmetry parameter, which can 

be used to adjust the skewness of the pulse. A more generalised form of this model has additional exponents, y, v, 

allowing for flatter/sharper peaks, viz. 


= Y T sd 
S(t|A,7,€,A,7,v) = Aep | El (t = E (=x) | : (18) 


We call this the extended FRED model, or FRED-X for short. An analytic normalisation exists for both the FRED 
and FRED-X models, which decouples the maximum height of the pulses from every parameter except A, such that 
A is the maximum amplitude of the pulse. Structured pulses can be modelled as either multiple overlapping pulses, 
or by accounting for the residual structure with another parameterisation. Thus, a single channel FRED light curve 
requires 4n + 1 parameters, where the +1 corresponds to the constant background parameter B. For bursts with many 
pulses, our model is: 


Stot = 5 S(t|A;, Aj, Tj, &). (19) 
J 


Our prior enforces 
Aj41 > Aj. (20) 


to ensure that we are not fitting the same pulse configuration in different permutations. 

The FRED model has proved a popular phenomenological fit due to its simplicity and a posteriori applicability to 
certain GRB progenitor models. Some authors have noted (and we confirm here) that there is a systematic residual 
structure, visible after subtracting the best-fit FRED and FRED-X models from a GRB light curve, indicating an 
imperfect fit [45]. We model these residuals using a sine-Gaussian wave packet 


t— Ares 5 
Tres 


as it is ubiquitous in physics and provides an adequate fit to the residual structure. The residual is part of our signal 
model, so we fit it simultaneously to any FRED pulses: 


res(t) = Ares exp cos (wt + p), (21) 


S(t|0) = S(t|Orrep) + res(t|Ores). (22) 
The lens model is similar to the null model, but with two extra parameters used to describe the delayed signal: 
Sens (flens) = S(€|Onun) + rmt i S(t + At|Onun) (23) 


where the lens parameter vector Ojens subsumes the null parameter vector Onun in addition to the time delay, At, and 
magnification ratio, r. The parameter r reduces the amplitude of the delayed signal while the parameter At describes 
the size of the delay. Thus, the lens model requires 4n + 3 parameters, where Niens is typically about half nnui. 

In order to determine which model is favoured, we calculate the Bayesian evidence for each model: 


Znull = J daul Loan (Ñ, Onun) (Onun), (24) 
Zlens = / dôiens Liens (N, Mens) ™(Oiens)- (25) 


Here m denotes a prior distribution and N is the vector of photon counts. Once we have each evidence, we obtain the 
In Bayes factor 


In(BF) =In Zuli —In Zigis: (26) 


The Bayes factor is a statistically rigorous measure of which model the data prefer. A ln Bayes factor of eight, 
In(BF) = 8, is considered “strong evidence” in support of one model over the other [26]. 

To perform parameter estimation and evidence calculations we use the Bilby Bayesian inference library [46]. We 
employ nested sampling [47] [48], taking advantage of the multi-ellipsoid bounding method [49], with dynamically 
updated sampling points [50]. Results tend to be unimodal, but we use multi-ellipsoid bounding regardless due to its 
flexibility and speed in the case of multimodal results. Some parameters recover bi-modal distributions, particularly 
in pulse start times Aj, due to the pre-binning of the BFITS datatype analysed. 


Priors. The priors are tabulated in Table[I] We use log uniform priors on parameters, which typically vary by orders 
of magnitude and uniform priors for other parameters. The prior ranges are informed by the BATSE GRB population. 
We fit models to a large selection of isolated, individual pulses to develop intuition for the practical bounds of the 
priors. A more sophisticated analysis would employ hierarchical inference to infer the shape of the prior distributions, 
but we leave this for future work. The priors for quantities related to time (A, Ares, At) are taken to be uniform. The 
prior range depends on the particular trigger being investigated, since the GRB population is (bimodal) log-normal in 
duration over many orders of magnitude. Thus the priors must be chosen appropriately for a region of time that will 
contain all the pulses. Care is taken with the time delay prior to ensure that the extra pulses due to lensing are not 
occurring outside the light curve under investigation. Additionally, the prior forces the second pulse to occur after the 
first pulse. 


parameter | minimum | maximum | type units 

A; T f | uniform seconds 
Ai A; uniform seconds 

B 107! 10° | log-uniform | counts / bin 
A 10° 10° | log-uniform | counts / bin 
T 10-3 10° | log-uniform | seconds 

E 1078 108 | log-uniform | — 

y 1078 108 | log-uniform | — 

v 1078 10° | log-uniform | — 

Ares uniform seconds 
Ae 10° 10° | log-uniform | counts / bin 
Tres 1078 10° | log-uniform | counts / bin 
w 1078 10° | log-uniform | — 

p =T m | uniform radians 


Table 1: Priors used in this analysis. Cells marked with f indicate that the prior bounds are chosen differently for 
each burst. Typically they are the start and end bins of the light curve. The prior for Ares has the same prior bounds 
as A. See the methods paragraph Priors for more information. 


Results & Interpretations. For GRB 950830, we find that lensing is strongly preferred over two-pulse models. 
The two pulses in Fig. [iare so alike, that the data prefer a fit with a single set of pulse-parameters. Thus, a single pulse 
seen twice with delay time At, and reduced in brightness by some scaling factor r is the preferred model compared to 
a more complex model with two completely independent pulses. We interpret this result as evidence that GRB 950830 
was strongly lensed. 

A lensed FRED-X model with a sine-Gaussian residual is the preferred model; (see Eqs. (18), for the signal 
model). There is only a single pulse, which is later repeated, so j € {1}. The fits to the light curve for each channel 
are shown in Extended Data Figs. (1-4). The median fits of the FRED-X pulses and sine-Gaussian pulses are plotted 
individually, and the sum of both is shown in the third panel of each figure. There are also 100 individual posterior 
draws for each model to show the breadth and multi-modality of the fits. Compared to the corresponding two-pulse 
model, the lens model is favoured by In(BF) = 12.9. We find that other lens models are similarly favoured. No 
matter which pulse model we implement, the lensing hypothesis is always favoured. The simplest model comparison 
comparing a single lensed FRED pulse to a null model with two independent FRED pulses, has the largest Bayes 
factor with In(BF) = 24.5. 

A model with more parameters is naturally penalised in Bayesian model selection by virtue of the larger region 
of prior volume explored when calculating the Bayesian evidence [26]. We therefore ask: is the lens model favoured 
because an additional pulse simply adds such a great volume to our prior space? We investigate the effect of prior 
volume on the model selection to assure ourselves that we have not arrived at a spurious result. The extra parameters 
in the FRED-X model are y and v. We test priors on y and v in the ranges, (10~',10'), (107, 107), (107%, 10°) in 
both uniform and log-uniform spaces. We also include a narrow Gaussian prior centred on the values found in previous 
analysis (typically between 1/4 — 4). In all, we study seven different prior volumes for two models in each of the four 
channels. The effect on the resultant model selection is minimal. Looking at the corresponding pairs of null and lens 
models which have the same priors, we find the In(BF) changes by ~ 1 — 2. 

Channel 3 (green) exhibits the greatest variation in its Bayes factor, likely due to having the highest signal-to-noise 
ratio. For channel 4 (blue), the inclusion of a sine-Gaussian residual makes the two-pulse (null) model marginally 
preferred (In(BF) ~ 1) over a lens model. The In(BF)s in favour of lensing are ~ 2 — 5 per channel, depending on 
the prior. The total Bayes factor quoted in the main body of the paper assumes log-uniform priors for y and v on 
(10-',10'). In any case, the two parameters used to infer the lens mass are largely independent of the model or 
choice of priors. We find that our result is independent of the nested sampling package, eg. Dynesty [5I], Nestle: 
used to run the analysis. 

As a sanity check, we apply our analysis not only to the pre-binned tte_bfits data, but we also take the photon 
arrival time data (tte_list) and run the analysis again on the counts incident at each triggered detector. There are 
three: detectors 5, 6, and 7. We bin the count data to 0.005 ms bins to match the pre-binned light curve and repeat 
our analysis. We find that there is no change to the model section. While individual Bayesian evidence factors and 
therefore model comparison Bayes factors fluctuate when considering different data, the lensing model is consistently 
preferred in each channel for each triggered detector. 

We also analyse the hardness of GRB 950830. The hardness-duration of GRB 950830 and the rest of the BATSE 
GRB population with published T90’s is shown in Extended Data Fig. 5. The hardness H32 is defined as background 
counts in channel 3 (110-320 keV, green) to the counts in channel 2 (60-110 keV, yellow). We estimate the background 
by taking the mean of the bins outside the trigger region of the burst light curve. We find that the hardness of Pulse A 
(2.09+0.10) and Pulse B (1.8340.11) of GRB 950830 agree within statistical errors. We expect two lensed pulses to 
exhibit the same hardness since lensing is achromatic for point sources. The slightly different duration between the 
two pulses (160ms, 130ms) is due to the lower amplitude of the second pulse (Pulse B). We apply a two-component 
Gaussian mixture model to segregate long and short y-ray bursts. The duration and hardness of GRB 950830 are 


typical of a short gamma-ray burst. We have include the autocorrelation of the light curve of GRB 950830 in Extended 
Data Fig. 6. 

To improve our analysis, we could a priori constrain the magnification ratio and time delay to be the same 
parameter in each of the spectral channels. The eight parameters in the lens model — two from each channel — are 
reduced to just two shared between the four energy channels. However, analysing the channels separately provides an 
independent check that the magnification ratios and time delays are consistent, in accordance with the equivalence 
principle (cf. Fig. 3, Extended Data Fig. 8). Below, we revisit this topic, analysing the data with multiple channels 
simultaneously. Doing this four-channel nested sampling analysis on the FRED-X model with a sine-Gaussian residual 
becomes prohibitively expensive. Since we believe that this will only increase the Bayesian evidence in favour of 
lensing, and since our result is already quite strong (In(BF) = 12.9), we leave this analysis for future work. 

Furthermore, we could include a spectral model to relate the pulse-fits in the four channels, reducing the number of 
free parameters in our model. The canonical GRB spectra model — the Band function — is a time averaged spectra [52], 
which would not suit our purposes of a time-evolving spectra. Addition of a spectral model requires the analysis of 
four time-series simultaneously, which is computationally challenging. We leave this as a goal for future work. 

Another future goal is to use hierarchical inference to infer the prior distributions for FRED-X parameters using 
the full catalogue of GRBs, the vast majority of which do not contain a lens. This would yield priors consistent with 
the population properties of GRBs. Again, we expect this to only strengthen the evidence in favour of lensing. Finally, 
we do not fully utilise the available high-resolution (TTE-list data), since this requires analysis of an 800,000 unit 
time series. This analysis is prohibitively expensive for the many-parameter models. We are able to run a simple 
FRED model Eq. (17), and found that lensing was similarly favoured in all channels when running the analysis with 
the pre-binned BFITS data. 

We have thus-far assumed that we expect each image of a gravitationally lensed GRB will be statistically consistent. 
We have not discussed the many potential causes for anisotropy between the images. Gamma-ray bursts are highly 
beamed due to the ultra-relativistic velocities (y ~ 10374) of their progenitor outflow. This results in the viewing 
angle onto the emission surface becoming important, since the radiation is beamed within and angle of 0 ~ y~!. 
The difference in viewing angle onto the source scales with the mass of the lens — the more massive the lens, the 
stronger the deflection, the greater the original angular separation of the lines-of-sight onto the source. Assuming 
a homogeneous emission surface, both lines-of-sight onto the source should be viewing the same region for masses 
Mens < 10’? Mo. For larger masses, the deflection angle becomes great enough such that the observer is viewing 
two emission regions which may not be in casual contact, thus the gravitationally lensed images need not be identical. 
For smaller lens masses, anisotropy in the GRB emission surface can result in the images having inherent differences. 
Finally, as discussed earlier, the detector orientation and energy response can have a significant effect on the inferred 
energy spectrum, potentially resulting in a false negative identification of a gravitationally lensed pair of y-ray bursts. 


Method limitations. Our method provides advantages over model-agnostic approaches such as correlation, which 
do not include all available information, for example, Poisson counting statistics. Bayesian inference provides a natural 
framework to make quantitative statistical statements about preferred models. Our methodology provides a metric for 
detection significance, and successfully rejects dubious candidates, which trigger an autocorrelation detection (see A 
rejected candidate). Of course, the results of Bayesian inference are only as good as the choice of model and priors. 
We try a variety of pulse models and prior ranges to ensure our results are robust, and find that the statistically 
identical pulse model is consistently preferred. This does not preclude the existence of a better pulse model. We 
have shown that the gravitational lens candidate GRB 950830 is robustly detected using both traditional GRB lensing 
techniques and our Bayesian inference method. 

Finally, we point out that any method relying on self similarity can produce false positive lensing candidates 
if identical repeating pulses are a feature of some y-ray bursts. However, we regard the “intrinsic self-similarity” 
hypothesis as unlikely since the vast majority of GRBs are not seen to repeat, and we cannot think of a physical 
mechanism that would cause a subpopulation of short GRBs to emit identical pulses. 


Estimate of Optical Depth. A rough estimate for the optical depth to strong gravitational lensing is 


= Niens 
= $ 
Nears 


(27) 


where Niens is the number of multiply-imaged GRBs and Nars is the total number of GRBs in our dataset. We find 
Mens = 1 lensed GRB in a dataset of Narg = 2,679, so the lens probability is P(r) ~ T = grs x 1074 (90% 
credibility), where we have used a Jeffreys prior. We may relate the energy density of lenses to the optical depth, 
Qı ~ T((zs)) [L7], where (zs) is the mean redshift of sources in the sample. 

For a point mass and a point lens, the angular Einstein radius of the lens is 


62 _ 4GM, da(Z1, Zs) 
# ce da(z)da(zs) 


Here, G is the gravitational constant, c the speed of light, and M; the mass of the gravitational lens. The angular 
diameter distances are defined as d4(z1, zs) = (x(zs) — x(z1)) /(1+ zs), with proper comoving distance 


(28) 


( ) c dz 
Zi Za) = . 
ea Ho zı VAG F O41 + z)’ 


We take Q, = 0.714, and Qm = 0.286, with present day Hubble constant Hy = 69.6kms~!Mpc7!. The angular 
impact parameter ( of the true position of the source to the lens can be parameterised in units of the Einstein radius, 
y = 8/0m. Such a configuration creates two images, with time delay given by 


(29) 


At(Mi, 21,9) = (1+ 21) ER F), (30) 


where 


fu) = (gvir EHE) (31) 


4—y 


A source with angular impact parameter 8 has an effective lensing cross-section of 


/ J= | om 9 BAB 


min 


Ymax 
= 162, I 2ydy 
y 


min 


4nGM A vee 
= TG l da(Zi, 2s) I 2y dy. 


c2 da(z1)da(Zs) y (32) 


min 


Thus, Ymin and Ymax turn the cross-section into an annulus. The minimum impact parameter is set by the time delay 
between the arrival times of the two images. If the time delay is too short, the images will appear as single gamma-ray 
burst. For a point lens, we may calculate the minimum time delay for a lens of mass M; at redshift z by inverting 


equation (30), 
c Atmin ) 


(1 + 2)4GM, (33) 


Ymin(Atmin; Mı, zı) = f ( 
since f(y) is monotonic increasing in y. We take Atmin = 10 ms. The latter-arriving image is dimmer than the first 
image but must still be above the detectable flux for the detector, p2peak > Yo, Where uo is the magnification of the 
dimmer image. This restricts the maximum possible impact parameter [23]: 


j 1/4 —1/4 
Ymax(Ppeak) = (1 +4 pes: ) (1 + Fees: ) , (34) 
Po Po 


where Ypeak/Yo is the peak counts divided by the trigger threshold at that time. We estimate ymax with medi- 
ans of the Ypeak/Yo for the peak flux on 64ms, 256ms, and 1024ms integrations from the BATSE Cmax table: 


https: //gammaray .nsstc.nasa.gov/batse/grb/catalog/4b/tables/4br_grossc.cmaxmin, which are 1.5, 2.2, and 


2.5 respectively. The final cross section is then 


— 4nGM, da(zi)da(%, Zs) 
7 Ce da(zs) 


a(#) cae i) (S) (Ymax a Ymin) (35) 


where 7 = (Mj, 21, Zs, Ppeak: Yo, Atmin) and © is the Heaviside step function. 
The optical depth is the number density n(z;) of lenses at redshift z;, multiplied by the effective lensing cross-section 
of each lens o(£), integrated over z € (0, zs): 


a= L f eme | oe (36) 


We assume a constant comoving density of lenses, 
m(z) = no(1 + z2). (37) 


The number density of lenses can be related to their energy density Q; through 


2 HEQ 
E ERa (38) 

Mı 8rGM, 
With comoving volume element 

d 

dV (z) = xX? (2) x) dzdQ 
Z 
c dz dQ 
= x"(z) (39) 


Ho SAn + Om( + 2)3" 
we have 


— 8AM f” 3 (1 + 21)x(21) ‘ f 
= ee Tin + Om(1 + a8 Ix(zs) — x(z)] 


` [YZax(Ppeak; po) = Gin (Atmin, Mi, z1)] i (40) 


With an estimated lens probability of P(T) ~ T ~ 3.7 x 1074, we infer the lens density by inversion of Eq. (40). The 
result of this integral is shown in Extended Data Fig. 9 for several mean source redshifts (z,) each with the three 
permutations of Ypeak/Yo from the BATSE Cmax table. 

The 7 BATSE bursts with known redshifts are GRB 970508: z = 0.835, GRB 970828: z = 0.958, GRB 971214: 
z = 3.412, GRB 980425: z = 0.0085, GRB 980703: z = 0.967, GRB 990123: z = 1.600, GRB 990510: z = 1.619, with 
mean (z) = 1.34. The average spectroscopic redshift of Swift y-ray bursts is (z) = 2.2 [53]. We are unable to accurately 
estimate the redshift of GRB 950830 or the BATSE catalogue in general due to the inherent degeneracy between the 
effects of cosmological redshift and relativistic beaming on a y-ray burst light curve. We argue that a mean BATSE 
GRB redshift of (zs) ~ 2 is appropriate based on the redshifts of known BATSE bursts and the spectroscopically 
determined redshifts of other GRB catalogues. We include the derived energy densities for a number of redshifts in 
Extended Data Fig. 10 for the comparison. The lens densities are calculated from the source redshifts and optical 
depth through Eq. (40). With the inferred lens densities Q, we may exclude our calculated globular cluster density 
Ngo ~ 8 x 10~® based on Poisson statistics. Observing one gravitational lens in ~ 2,679 light curves is very unlikely 
for such a low cosmological density. 

The present day number density is given by 


T(z 


Qimea (2s) Pe 
Nimen (21, Zs) T Pait, (41) 
IMBH 


where pe is the critical energy density of the Universe and Miypu(z1) is the mass of the lens. With Oimn((Zs) ~ 
2) = 4.6733 x 1074, and (1 + 2:)Maex = 5.5754 x 10° Mo => Momalz ~ 1) ~ 2.855 x 10* Mo yields 
Tome = 23. x 10° Mpc™? through Eq. (38). Where z; ~ 1 comes from a gravitational lens being most likely to 
occur halfway between the observer and source. Uncertainty on the density of IMBHs arises from the fact that Mens 
is Poisson distributed. We calculate the uncertainty on n; assuming z; = 0, as we do not know where the lens is, only 
that its most probable redshift is z ~ 1. We ignore the uncertainty due to the lens mass as it is much more precisely 
determined. In order to calculate the uncertainty on Nimen, we assume that the likelihood of the data given niysx 
follows an N = 1 Poisson distribution. Employing a Jeffreys’ prior, 


7 (Ten X name), (42) 


we obtain a 90% credible interval of nimes = 0.7 — 7.2 x 10? Mpc~?. The calculated lens densities for each redshift are 
summarised in Table 2] 


(zs) OQi(Zs) (21, Zs) Qe. exclusion 
0.1 3.555% x 10-1 7.975% x 10° 99.99999% 
0.5 1 9723 x107? 3. gti x 104 99.9988% 

10 2462 x 1078 G1 x 103 99.988% 

1.34 1.250 x 1073 5.1t3%0 x103 99.97% 

2.0 4.638 x 1074 2.3118 x103 99.85% 

BO Als x 1075 3.6tb% x10? 95.05% 


Table 2: The inferred lens densities Q; for mean source redshift z,. A median peak counts ratio C = 2.5 from 
the BATSE Cmax/Cmin table for 1,024ms integration times is assumed. The peak count ratios are defined through 
C = Cmax/Cmin. Cmax is the maximum detected counts over a given integration period. Cmin is the minimum number 
of counts that would trigger the second most brightly illuminated detector at that time. Further details are given in 
the Supplementary Section calculation of optical depths. 


Magnification bias. It is possible that magnification bias may affect the estimate of the expected probability of 
lensing events. The possibility of magnification bias is discussed in [23], but the details are updated here from [54]. 
In order for magnification bias to be greater than a few percent, the cumulative number counts dN of the physical 
parameter P by which the events are detected needs to have a > 2 where dN x P~“. In the case of GRBs, P is the 
peak flux over the trigger energy range. For BATSE, the number counts as a function of peak flux are given in Fig. 
6 [54]. Estimating the value of a from these plots gives a ~ 1.1. The trigger flux for GRB 950830 falls near the faint 
end of the distribution at 2.81 + 0.12 y cm~? sec~! in the 50 — 300 keV energy range over integration times of 64 ms, 
256 ms and 1024 ms. 


The lens is unlikely to be a globular cluster. Let us assume that all globular clusters have the same mass as the 
discovered deflector. The globular cluster mass function has a turnover at ~ 2 x 10° Mo , so this approximation is 
not necessarily a bad one, especially given the uncertainty in the inferred lens mass. The Milky Way formed from an 
overdensity on a scale of approximately 20 Mpc?. If the 200 globular clusters in the Milky Way is an average number 
count for a given cosmic volume, then the number density of globular clusters is approximately n ~ 10 Mpc~?. Thus 
with n ~ 10 Mpc™?, M ~ 10° Mo: 
8rG 
Qec = 3H?’ p=nM 
~ 8x 107° 


This is too low to be consistent with our inferred value of Q ~ 4x 1074, suggesting that the lens is not a globular cluster. 
Assuming a mean source redshift of (z,) ~ 2 in the BATSE catalogue, we exclude a globular cluster lens at 99.85% 
credibility. See Extended Data Fig. 10 for exclusion credibilities for different mean redshifts. The mean of BATSE 
gamma-ray bursts with known redshift (z) = 1.34 gives an exclusion credibility of 99.97%, with Qı = 1.43 x 1078. 


Combining data from different channels. In the analysis presented above we analyse each channel independently. 
This enables us to carry out posterior checks (see Fig. |2) while controlling computational costs. However, in order 
to estimate the significance of the lensing signal, we combine the results from each channel together, increasing the 
resolving power to obtain a single Bayes factor for the lensing versus no-lensing hypotheses. There are several steps. 
First, we take the posterior samples used to calculate the credible intervals for delay time At and magnification ratio 
r in Fig. [2] and use a kernel density estimator (KDE) to obtain an analytic description of the posterior distribution for 
each channel i: 


p(At, r|di). (43) 


This step is necessary in order to obtain smooth functions that can be multiplied together. Next, we invoke Bayes’ 
theorem to convert the KDEs to likelihoods distributions for each channel: 


L(d;|At,r) = p(At, r|d;). (44) 


Zi 
n(At,r)" 
If the two pulses are created by a lens, then the delay time At and magnification ratio r are the same for each channel. 
Thus, the total evidence for the lensing hypothesis is given by the product of likelihoods for each data channel, 
marginalised over (At, r): 


Ge = I dAt I aA A (45) 
=| dAt fom dr x(At,r) II aes J” ps(Atsrld)) . (46) 
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On the other hand, if the two pulses are not created from a lens, there is no reason for the delay time and magnification 
ratio to be the same for each channel, and so the total evidence for the null hypothesis is given simply by the product 
of the evidence values for each channel: 


Zhai = II Znull, i (47) 


Evaluating Eqs. [45] and [47] we obtain a total Bayes factor In(BF) = 12.93 in favour of the lensing hypothesis. 


False alarm probability. In Bayesian statistics, we carry out model selection using the posterior odds [26], 


Z ens ens 
glens _ Al TI (48) 


null 7 , 
Znull Tnull 


where Ziens/Znui = BF is the Bayes factor from Eq. (26). The prior odds, Tens/Tnul, expresses the relative probability 
assigned a priori to the hypothesis of lensing over a null model. If we assign an equal prior probability to both 
hypotheses, then 


Ziens 
glens = 49 
null Zi ( ) 


Plens 
= 50 
I= Plens ( ) 


and so the false alarm probability is 


1 
1+ BF 

1 
1 + e12.9 
= 2.5 x 107° 


1— Plens = 


A more conservative approach is to assign prior odds equal to the reciprocal of the total number of bursts searched 
(or equivalently, equal to the optical depth), which gives us: 


Ziens Tens 


Qiens = 


null 


Znull Tull 
BF 


2,679’ 


which gives a false alarm probability of 


1 
1+ BF/2,679 
1 
1 + e12-9/2, 679 

= 6.5 x 107%. 


1- Plens = 


We see that the strong preference for the lensing hypothesis persists even taking into account trial factors. 


A rejected candidate. For illustrative purposes, we also include one gravitational lensing candidate identified by 
the autocorrelation algorithm, which our Bayesian framework strongly rejects. The light curve for GRB 911031 is 
shown in Extended Data Fig. 7 as the sum of the four BATSE large area detector broadband energy channels and 
separately for each channel in the first and third panels respectively. Each colour indicates a different energy channel, 
red: 20 — 60 keV, yellow: 60 — 110 keV, green: 110 — 320 keV, blue: 320 — 2,000 keV. The light curve has the sort of 
repeating pulse structure which one might naively mistake for lensing. The time delay is even similar in each energy 
channel. The second and fourth panels show the correlogram (autocorrelation) of the summed and spectral light curves 
respectively. 

The results of autocorrelation indicate very strongly that there is some similar structure in the two pulses. However, 
the pulse modelling prefers a two-pulse model in every channel because the two pulses are not precise duplicates. Not 
only are the pulse shapes (7, €) different between the two pulses, but the time-delays and magnification ratios (At, r) 
inferred through parameter estimation are inconsistent. Extended Data Fig. 8 shows the gravitational lens parameter 
posterior distributions for lens model fit to GRB 911031. Gravitational lensing for point sources is achromatic, so 
the time-delay and magnification ratio posteriors for each channel should be overlapping for a gravitational lensing 
candidate. The clear correlation between energy, magnification, and time delay immediately suggests that these are 
two independent pulses of a GRB and not a gravitational lensing echo. The In(BF) in favour of a two-pulse model 
are 229.0, 301.7, 374.0 15.4 for channels 1, 2, 3, and 4 respectively. We can therefore firmly rule out the lensing 
interpretation for this burst. 


Data Availability 


The BATSE data catalogue is available from the NASA data archive: 
We use the discsc, tte, and tte_list datatypes in our search. The data used in our analy- 
sis of GRB 950830 is found at 
03770 burst/tte_bfits_37/0.its.ga|uttps://heasarc.gsic.nasa. gov/FTP/compton/data/batse/trigger /03 
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Code Availability 


The analysis code PyGRB [56] has been written in python [57] by J.P. and is freely available at https://github. 
com/JamesPaynter/PyGRB under the BSD 3-Clause License. PyGRB is built around Monash University’s Bilby nested 


sampling wrapper, with additional FITS I/O functionality provided by AstroPy [58]. The software uses the NumPy 
and SciPy computational libraries. Plotting makes use of the Matplotlib [61] and corner [62] libraries. 
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Figure 1: The gravitationally lensed y-ray Burst, BATSE trigger 3770 — GRB 950830. a) The light curve 
is the pre-binned 5ms tte BFITS data. Each colour indicates a different energy channel, red: 20 — 60 keV, yellow: 
60 — 110 keV, green: 110 — 320 keV, blue: 320 — 2,000 keV. The coloured shaded regions are the 1-ø statistical errors 
of the y-ray count data. The solid black curves are the posterior predictive curves, the mean of > 60,000 fits. As 
they are the mean of = 60,000 fits, the burst and echo image-fits need not be the same, even though the individual 
fits which make up the mean are identical. Over-plotted in fainter black are curves of 100 random parameter draws 
from the = 60,000 total posterior sample sets. b-e) The lower panels are the difference between the true light curve 
and the posterior predictive curve. The shaded regions have been transformed in the same fashion. Note that 68% of 
the peaks and troughs in the residual should lie within the corresponding shaded region for a good fit. The individual 
pulses that make up the combined fits are shown in Extended Data Figs. 1, 2, 3, and 4 for channels 1, 2, 3, and 4 
respectively. 
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Figure 2: Marginalised posterior distribution of time-delays vs magnification ratios for the gravitationally 
lensed y-ray burst GRB 950830. The colours indicate the following energy channels, red: 20 — 60 keV, yellow: 
60—110 keV, green: 110—320 keV, blue: 320—2, 000 keV. The black region is the result of the product of Gaussian kernel 
density estimates for each of the four channels, resulting in tighter constraints on the time delay and magnification 
ratio. The contours contain 39.3%. 86.4%, and 98.9% of the probability density. 
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Figure 3: The redshifted lens mass. Taking the time delays and magnification ratios posterior samples shown 
in Figure (2), the mass of the gravitational lens may be determined through equation (i). The colours indicate the 
mass recovered from the following energy channels, red: 20 — 60 keV, yellow: 60 — 110 keV, green: 110 — 320 keV, 
blue: 320 — 2,000 keV. The black histogram is the mass inferred from the Gaussian kernel density estimate of the four 
energy channels (see Figure (2}). The median of the mass is (1 + 2)M, = 5.579 x 104 Mo (90% credibility). The 
inferred mass is dependent on the redshift of the lens itself. Without any knowledge of the lens or source redshift, all 
we are able to say is 0 < 2 < zs, and so Mi < (1+ 2)M.. 
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Figure 4: The individual pulses that make up channel 1 (red: 20 — 60 keV) of Figure a) The solid red 
curves are the median of ~ 60,000 FRED-X pulses sampled from the posterior distributions. Two hundred of these 
curves are sampled and shown in black. b) The same as a) for the sine-Gaussian residual. c) The sum of the medians 


of the pulses in a) and b). 
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Figure 5: The individual pulses that make up channel 2 (yellow: 60 — 110 keV) of Figure a) The solid 
yellow curves are the median of ~ 60,000 FRED-X pulses sampled from the posterior distributions. 200 of these curves 
are sampled and shown in black. b) The same as a) for the sine-Gaussian residual. c) The sum of the medians of the 


pulses in a) and b). 
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Figure 6: The individual pulses that make up channel 3 (green: 110 — 320 keV) of Figure [1| a) The solid 
green curves are the median of ~ 60,000 FRED-X pulses sampled from the posterior distributions. 200 of these curves 
are sampled and shown in black. b) The same as a) for the sine-Gaussian residual. c) The sum of the medians of the 
pulses in a) and b). 
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Figure 7: The individual pulses that make up channel 4 (blue: 320 -— 2,000 keV) of Figure [1] a) The solid 
blue curves are the median of ~ 60,000 FRED-X pulses sampled from the posterior distributions. 200 of these curves 
are sampled and shown in black. b) The same as a) for the sine-Gaussian residual. c) The sum of the medians of the 
pulses in a) and b). 
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Figure 8: The Hardness-Duration plot of BATSE GRBs. The T90 durations are taken from the BATSE data 


tables: https ://gammaray.nsstc.nasa.gov/batse/grb/catalog/4b/index.html, We calculate the hardness ratios 


for each of the GRBs with a listed T90. The short y-ray burst population is shown in purple, and the long GRB 
population in red. Iso-likelihood contours of a two-component Gaussian mixture model are plotted in grey. The plotted 
uncertainties in the hardness ratio are defined by 1-ø statistical errors on the number of counts in the numerator and 
denominator. 
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Figure 9: The autocorrelation of the light curve of GRB 950830. a) The sum of the four energy channels, 
~ 20 — 2,000 keV. b) The autocorrelation function of the summed light curve, where the autocorrelation is defined in 
equation (6). The black dotted line is a fit to the light curve with a 3rd order Savitzky-Golay smoothing filter with a 
101 bin smoothing window. The vertical red dotted line is the point of maximum deviation between the ACF and the 
Savitzky-Golay smoothing filter at dt = 0.390 seconds. The blue shaded regions delineate regions of 1-0, 3-0, and 5-0 
away from the Savitzky-Golay fit. The dispersion between the autocorrelation function and the fit, o?, is defined in 
equation (7). c) The autocorrelation function for each of the 4 BATSE large area detector broadband energy channels. 
Each colour indicates a different energy channel, red: 20 — 60 keV, yellow: 60 — 110 keV, green: 110 — 320 keV, blue: 
320 — 2,000 keV. The shaded regions delineate 3 — ø deviance from the Savitzky-Golay fits, which are omitted for 
clarity. 
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Figure 10: Optical depth as a function of source redshift z,. We estimate the optical depth for mean source 
redshifts zs = 0.1: blue, zs = 0.5: orange, z, = 1.0: green, z, = 1.34: black, zę = 2.0: red, zs = 5.0: purple based on 
Eq. (40). The median Cymax /Cmin values of 1.5, 2.2, and 2.5 taken as the magnification limit cutoff (Eq. (34)) are shown 
as solid, dash-dot, and dashed curves respectively. The solid black horizontal line is the estimate lens probability based 
on seeing one event in ~ 2,700 light curves. The dotted black vertical line is the estimated globular cluster density, 
Qcc. The dash dot vertical black line is the naive estimate for the density Qiens ~ T. The calculated lens densities for 
each redshift are summarised in Table [2] 
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Figure 11: The autocorrelation of the light curve of GRB 911031. a) The sum of the four energy channels, 
~ 20 — 2,000 MeV. b) The autocorrelation function of the summed light curve, where the autocorrelation is defined 
in equation (6). The dotted line is a fit to the light curve with a 3rd order Savitzky-Golay smoothing filter with a 101 
bin smoothing window. The blue shaded regions delineate regions of 1-o, 3-0, and 5-o away from the Savitzky-Golay 
fit. The dispersion between the autocorrelation function and the fit, o?, is defined in equation (7). c) The light curve 
for each of the 4 BATSE large area detector broadband energy channels. Each colour indicates a different energy 
channel, red: 20 — 60 keV, yellow: 60 — 110 keV, green: 110 — 320 keV, blue: 320 — 2,000 keV. d) The autocorrelation 
function for each light curve channel. The shaded regions delineate 3 — o deviance from the Savitzky-Golay fits, which 
are omitted for clarity. 
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Figure 12: The gravitational lens parameter posterior distributions for a model fit to GRB 911031 for 
each of the 4 BATSE large area detector broadband energy channels. Each colour indicates a different 
energy channel, red: 20 — 60 keV, yellow: 60— 110 keV, green: 110 — 320 keV, blue: 320 -— 2, 000 keV. Contours contain 
39.3%. 86.4%, and 98.9% of the probability density. The light curve of GRB 911031 is shown in Figure [1] 
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